A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers
Abstract
Computational morphology is an important part of natural language processing. Finite-state techniques have been applied successfully in computational phonology and morphology to many of the world’s major languages. Celtic languages such as Modern Irish present challenging morphological features that to date have not been addressed using finite-state technology. This paper presents a finite-state two-level morphology of Irish developed using Xerox Finite-State Tools. The system encodes the inflectional morphology of all inflected parts of speech in Modern Irish. The morphotactics of stems and affixes are encoded in the lexicon, and word mutations are implemented as a series of replace rules encoded as regular expressions. Both the lexicons and the rules are compiled into finite-state transducers and combined to produce a single lexical transducer for the language. A major advantage of finite-state two-level implementations of morphology is their inherent bi-directionality: the same system is used for both analysis and generation of word forms in the language. This resource can be used as a component in many NLP applications such as spelling checkers/correctors, stemmers, and text-to-speech synthesisers. It can also be used in tokenising, lemmatising and part-of-speech tagging of a corpus of text. The system, which is designed for broad coverage of the language, is evaluated against the most frequently used words in a corpus of contemporary Irish texts. Finally, possible extensions to the system are suggested, such as derivational morphology and the inclusion of dialectal or historical word-forms.
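To make the idea of mutations as replace rules and of bi-directionality more concrete, the following is a minimal Python sketch, not the Xerox-based implementation the abstract describes. It assumes a deliberately simplified lenition rule (insert "h" after a lenitable initial consonant, leaving "s" unchanged before c, f, m, p, t, v), a hypothetical three-word lexicon, and illustrative tag names (`+Noun`, `+Len`) that are not taken from the paper. Analysis is done by running the generator over the lexicon, which only mimics what a real transducer achieves by inversion.

```python
import re

# Hypothetical toy lexicon (lemma -> part-of-speech tag); not from the paper.
LEXICON = {"bean": "+Noun", "cat": "+Noun", "sagart": "+Noun"}

# Simplified lenition as a single replace rule written as a regular expression:
# insert 'h' after a word-initial b, c, d, f, g, m, p, t, or after s unless the
# s is followed by c, f, m, p, t or v. Already-lenited words are left alone.
LENITION = re.compile(r"^([bcdfgmpt]|s(?![cfmptv]))(?!h)")


def lenite(word):
    """Apply the (simplified) lenition replace rule to a lower-case word."""
    return LENITION.sub(r"\1h", word)


def generate(lemma, lenited=False):
    """Lexical side -> surface form. Returns None for unknown lemmas."""
    if lemma not in LEXICON:
        return None
    return lenite(lemma) if lenited else lemma


def analyse(surface):
    """Surface form -> lexical analyses, using the same lexicon and rule."""
    analyses = []
    for lemma, pos in LEXICON.items():
        for lenited in (False, True):
            if generate(lemma, lenited) == surface:
                analyses.append(lemma + pos + ("+Len" if lenited else ""))
    return analyses


if __name__ == "__main__":
    print(generate("bean", lenited=True))
    print(analyse("bhean"))
```

The point of the sketch is only that one lexicon and one rule serve both directions; a compiled lexical transducer gives the same property without enumerating the lexicon at analysis time.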
Similar resources
A Two-Level Morphological Analyser for the Indonesian Language
This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc...
A Morphological Analyser for Machine Translation Based on Finite-state Transducers
A finite-state, rule-based morphological analyser is presented here, within the framework of machine translation system TAVAL. This morphological analyser introduces specific features which are particularly useful for translation, such as the detection and morphological tagging of word groups that act as a single lexical unit for translation purposes. The case where words in one such group are ...
Comparing nondeterministic and quasideterministic finite-state transducers built from morphological dictionaries
This paper describes a comparison between quasideterministic and nondeterministic finite-state transducers generated from morphological dictionaries containing the vocabulary (lemmas) and the morphological inflection information of a natural language processing application such as the morphological analyser of a machine translation system. Results show that non-deterministic transducers are mor...
Finite-state Relations Between Two Historically Closely Related Languages
Regular correspondences between historically related languages can be modelled using finite-state transducers (FST). A new method is presented by demonstrating it with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a protolanguage) is established between two related languages. This representation, AFE (Aligned Finnish-Estonian) is based on the l...
Morphological Features of the Irish Universal Dependencies Treebank
The Universal Dependencies Project (Nivre, [9]; Nivre et al., [10]) is an ongoing effort towards creating a set of harmonised dependency treebanks that are annotated and structured according to universal guidelines. This paper reports on the addition of morphological features to the Irish Universal Dependencies Treebank (IUDT). Our feature set subscribes to the feature inventory of the UD Proj...
Publication date: 2002